fix: add resource limits, PodDisruptionBudgets, and backup error handling#427
Conversation
Change PDBs from minAvailable:1 to maxUnavailable:1 so single-replica workloads don't block node drains and cluster upgrades. Bump API and Hasura memory limits from 512Mi to 1Gi and CPU from 500m to 1000m to handle NestJS+BullMQ+WebSocket and Hasura subscription load.
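A minimal sketch of the PDB change described above (names and labels are illustrative, not taken from the repo):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api
spec:
  # was `minAvailable: 1`, which with replicas=1 means zero pods
  # may ever be evicted, blocking node drains and upgrades
  maxUnavailable: 1
  selector:
    matchLabels:
      app: api
```

With `maxUnavailable: 1`, a single-replica deployment still allows its one pod to be evicted during a drain, at the cost of a brief outage while it reschedules.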
```yaml
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "1Gi"
```
For larger installs, no. Do max 4 GB.
Bumped API memory limit to 4Gi in ffa968e.
```yaml
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "1Gi"
```
Way too little; in a production instance I have 4 GB. Let's limit to 4 GB.
Bumped Hasura memory limit to 4Gi in ffa968e.
Keep CPU requests for scheduling but drop limits so containers aren't throttled under load.
Removed CPU limits across all services in 6417828. CPU requests are still set for scheduling so pods land on adequately sized nodes, but nothing throttles them now. Memory limits: API/Hasura at 4Gi as you specified. Left the others (Redis 256Mi, MinIO 512Mi, TimescaleDB 1Gi, Typesense 512Mi, Web 256Mi) at the initial values; if any of those are too low, let me know what your prod values are and I'll bump.
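As a sketch, the resulting per-service `resources` stanza looks like this (values shown for the API service; exact numbers per the thread above):

```yaml
resources:
  requests:
    cpu: "500m"      # kept so the scheduler can bin-pack correctly
    memory: "512Mi"
  limits:
    memory: "4Gi"    # memory limit only; no cpu limit, so no CFS throttling
```

Dropping the CPU limit while keeping the request is a common pattern: the request guarantees a scheduling floor, while the absence of a limit lets the container burst into idle CPU instead of being throttled by the CFS quota.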

Summary
Remove `|| true` from `apk add`, add pg_dump output validation, add S3 upload error checking. Addresses #415 and #416.
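A minimal sketch of the backup error-handling pattern the summary describes. The dump path is hypothetical, and the `pg_dump` and upload commands are stubbed out in comments so the sketch is self-contained:

```shell
#!/bin/sh
set -eu  # abort on the first failing command instead of masking errors with `|| true`

DUMP=/tmp/backup.sql  # hypothetical path

# Real job: pg_dump "$DATABASE_URL" > "$DUMP"
# Stand-in so the sketch runs anywhere:
printf 'SELECT 1;\n' > "$DUMP"

# Validate pg_dump output: refuse to upload an empty dump
[ -s "$DUMP" ] || { echo "pg_dump produced an empty dump" >&2; exit 1; }
echo "dump validated"

# Real job would then upload and surface any failure:
# aws s3 cp "$DUMP" "s3://<bucket>/backup.sql" || { echo "S3 upload failed" >&2; exit 1; }
```

With `set -eu` and explicit exit-status checks, a failed dump or upload fails the whole backup job instead of silently producing an empty or missing backup.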
Test plan